Many entities identified by the algorithm are not
phenomena (e.g. thermal, decrease, layer).
SWEET as a taxonomy is noisy, which limits the
performance of the identification algorithm.


Heuristics-based entity identification algorithms
provide mixed results. The algorithm finds the key
entities identified by expert reviewers, but its
ranking of their importance is inadequate.
Additionally, more generic entities are typically
ranked higher than more relevant ones, and multiple
entities are extracted that represent the same
variable (e.g. optical depth, optical
depth/thickness, aerosol optical depth).
Step 0: Parse paper for potential entities
(Phenomena, Datasets, Instruments, Variables) 
Step 1: Weighted heuristic algorithm applied to 
scientific papers to identify entities to build a 
training dataset using existing taxonomies 
(GCMD, CF, SWEET) 

Step 2: Algorithm entity extraction evaluated by 
scientific experts and classified 
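
The weighted heuristic in Step 1 is not specified in detail here, so the following is only a minimal sketch of one way such a scorer could work. It assumes that a paper is already split into named sections and that taxonomy terms are available as plain strings. The section weights, the function names `score_entities` and `rank_entities`, and the sample text and terms are all hypothetical and are not taken from GCMD, CF, or SWEET.

```python
import re
from collections import defaultdict

# Hypothetical weights: a match in the title counts more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def score_entities(paper, taxonomy_terms, weights=SECTION_WEIGHTS):
    """Return {term: weighted score} for taxonomy terms found in the paper."""
    scores = defaultdict(float)
    for section, text in paper.items():
        weight = weights.get(section, 1.0)
        lowered = text.lower()
        for term in taxonomy_terms:
            # Whole-phrase match so that "dust" does not match "industry".
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                scores[term] += weight * hits
    return dict(scores)


def rank_entities(scores):
    """Sort terms by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))


# Made-up sample paper and term list, for illustration only.
paper = {
    "title": "Dust detection using brightness temperature",
    "abstract": "We use brightness temperature differences to detect dust.",
    "body": "Dust emissivity varies. The dust layer changes radiance.",
}
terms = ["dust", "brightness temperature", "emissivity", "radiance"]

scores = score_entities(paper, terms)
ranking = rank_entities(scores)
print(ranking)
```

A ranked list like this is what expert reviewers would evaluate in Step 2. The weights are the main tuning knob in a scheme of this kind.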
Knowledge Graphs link key entities in a specific
domain with other entities via relationships. From 
these relationships, researchers can query 
knowledge graphs for probabilistic recommendations 
to infer new knowledge. Scientific papers are an 
untapped resource which knowledge graphs could 
leverage to accelerate research discovery. 
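
As a rough illustration of entities linked by relationships, the sketch below stores (subject, relation, object) triples with a confidence value and returns matches ordered by confidence. The `KnowledgeGraph` class, the relation names, and the confidence values are assumptions made up for this example. They do not describe the system presented here.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory triple store with a confidence score per edge."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj, confidence=1.0):
        self._edges[subject].append((relation, obj, confidence))

    def query(self, subject, relation=None):
        """Return edges from subject, highest confidence first."""
        matches = [
            edge
            for edge in self._edges.get(subject, [])
            if relation is None or edge[0] == relation
        ]
        return sorted(matches, key=lambda edge: -edge[2])


# Illustrative triples only; the confidences are invented.
kg = KnowledgeGraph()
kg.add("dust", "observed_by", "instrument A", 0.6)
kg.add("dust", "characterized_by", "emissivity", 0.7)
kg.add("dust", "characterized_by", "brightness temperature", 0.9)

top = kg.query("dust", "characterized_by")
print(top)
```

Ordering results by confidence is one simple way to turn stored relationships into ranked recommendations.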
Top ~/ results make sense scientifically
Cleaned-up SWEET holds the best potential
• Steps 3-11: Use results from the heuristic
algorithm to build training data for deep learning
• TF/IDF performs better than total counts
Brightness temperature is ranked higher than in the
total counts result
Uncovered an error in the paper: "Dust has a higher
albedo at 12 microns instead of 11" should refer to
temperature, not albedo
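
To show why TF/IDF can promote a term like brightness temperature over a generic one, here is a minimal sketch that compares raw counts with a plain TF-IDF score, using idf = log(N / df). The three-document corpus is invented for illustration, and the exact weighting variant used in this work is not stated, so this formula is an assumption.

```python
import math
from collections import Counter


def tf_idf(doc, corpus):
    """TF-IDF per term in doc, with idf = log(N / df). doc must be in corpus."""
    n_docs = len(corpus)
    counts = Counter(doc)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for other in corpus if term in other)
        scores[term] = tf * math.log(n_docs / df)
    return scores


# Each document is a list of entity mentions (made-up data).
corpus = [
    ["layer", "layer", "layer", "brightness temperature", "brightness temperature"],
    ["layer", "aerosol"],
    ["layer", "cloud"],
]
target = corpus[0]

by_count = Counter(target).most_common()
by_tfidf = sorted(tf_idf(target, corpus).items(), key=lambda kv: -kv[1])

print("total counts:", by_count)
print("tf-idf:", by_tfidf)
```

In this toy corpus "layer" has the highest raw count but appears in every document, so its idf is zero and "brightness temperature" moves to the top.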
(SWEET)
Improved results and can be used for extractions


• GCMD does not differentiate between entity
types: physical property, phenomena, etc.
• Emissivity and radiance are important properties


Figure 1. Overall conceptual diagram. 
Goal: Develop an end-to-end (semi-)automated
methodology for constructing Knowledge Graphs for 